Abstract: The problem of anomaly localization in a resource-constrained cyber system is considered. Each anomalous component of the system incurs a cost per unit time until its anomaly is identified and fixed. Different anomalous components may incur different costs depending on their criticality to the system. Due to resource constraints, only one component can be probed at any given time. The observations from a probed component are realizations drawn from two different distributions depending on whether the component is normal or anomalous. The objective is a probing strategy that minimizes the total expected cost incurred by all the components during the detection process, under reliability constraints. We consider both independent and exclusive models. In the former, each component can be abnormal with a certain probability independent of other components. In the latter, one and only one component is abnormal. We develop optimal index policies under both models. The proposed index policies apply to a more general case where a subset (more than one) of the components can be probed simultaneously. The problem under study also finds applications in spectrum scanning in cognitive radio networks and event detection in sensor networks.

Index Terms: Anomaly localization, composite hypothesis testing, sequential hypothesis testing, sequential probability ratio test (SPRT).


I. Introduction

Consider a cyber system with $K$ components. Each component may be in a normal or an abnormal state. If abnormal, component $k$ incurs a cost $C_k$ per unit time until its anomaly is identified and fixed. Due to resource constraints, only one component can be probed at a time, and switching to a different component is allowed only when the state of the currently probed component is declared. The observations from a probed component (say $k$) follow distribution $f_k^{(0)}$ or $f_k^{(1)}$ depending on whether the component is normal or anomalous, respectively. The objective is a probing strategy that dynamically determines the order of the sequential tests performed on all the components so that the total cost incurred by the system during the entire detection process is minimized under reliability constraints.

A. Main Results

The above problem presents an interesting twist to the classic sequential hypothesis testing problem. When there is only one component, minimizing the cost is equivalent to minimizing the detection delay, and the problem reduces to a classic sequential test, for which both the simple and the composite hypothesis cases have been well studied. With multiple components, however, minimizing the detection delay of each component is no longer sufficient. The key to minimizing the total cost is the order in which the components are tested. Intuitively, we should prioritize components that incur higher costs when abnormal, as well as components with higher prior probabilities of being abnormal. Another parameter that plays a role in the total system cost is the expected time needed to detect the state of a component, which depends on the observation distributions $\{f_k^{(0)}, f_k^{(1)}\}$. It is desirable to place components that require longer testing times toward the end of the testing process. The challenge is how to balance these parameters in the dynamic probing strategy.

We show in this paper that the optimal probing strategy is an open-loop policy: the testing order can be predetermined, independent of the realizations of each individual test in terms of both the test outcome and the detection time. Furthermore, the probing order is given by a simple index. Specifically, under the independent model, where each component is abnormal with a prior probability $\pi_k$ independent of other components, the index is of the form $\pi_k C_k / \mathbb{E}(N_k)$, where $\mathbb{E}(N_k)$ is the expected detection time for component $k$. Under the exclusive model, where one and only one component is abnormal, the index is of the form $\pi_k C_k / \mathbb{E}(N_k \mid H_0)$, where $\mathbb{E}(N_k \mid H_0)$ is the expected detection time for component $k$ under the hypothesis that it is normal. These index forms give a clean expression of how the three key parameters (the cost, the prior probability, and the difficulty of distinguishing the normal distribution $f_k^{(0)}$ from the abnormal distribution $f_k^{(1)}$) are balanced in choosing the probing order. It is also interesting to note the difference between the indices for these two models. Intuitively speaking, under the independent model the detection time of any component, normal or abnormal, adds to the delay in catching the next abnormal component, whereas under the exclusive model only the detection times of components in a normal state add to the delay in catching the abnormal component.

The above simple index forms of the probing order are optimal for both the simple hypothesis case (where $\{f_k^{(0)}, f_k^{(1)}\}_{k=1}^{K}$ are known) and the composite hypothesis case (where $\{f_k^{(0)}, f_k^{(1)}\}_{k=1}^{K}$ have unknown parameters). These index policies also apply to the case where multiple components can be probed simultaneously. While the optimality of these indices in that case remains open, simulation examples demonstrate their strong performance.

B. Applications

The problem considered here finds applications in anomaly detection in cyber systems, spectrum scanning in cognitive radio systems, and event detection in sensor networks. In the following we give two specific examples.

Consider a cyber network consisting of $K$ components (which can be routers, paths, etc.). Due to resource constraints, only a subset of the components can be probed at a time. An Intrusion Detection System (IDS) analyzes the traffic over the components to detect Denial of Service (DoS) attacks (such attacks aim to overwhelm the component with useless traffic to make it unavailable for its intended use). Let the cost $C_k$ be the normal expected traffic (packets per unit time) over component $k$. Thus, in this example minimizing the total expected cost minimizes the total expected number of failed packets in the network during DoS attacks. The exclusive model applies to cases where an intrusion into a subnet consisting of $K$ components has been detected and the probability of each component being compromised is small (thus, with high probability, there is only one abnormal component).

Another example is spectrum sensing in cognitive radio systems. Consider a spectrum consisting of $K$ orthogonal channels. Accessing an idle channel leads to a successful transmission, while accessing a busy channel results in a collision with other users. A Cognitive Radio (CR) is an intelligent device that can detect and access idle channels in the wireless spectrum [1]. Due to resource constraints, only a subset of the channels can be sensed at a time. Once a channel is identified as idle, the CR transmits over it. Let $C_k$ be the achievable rate over channel $k$. Thus, in this example minimizing the total expected cost minimizes the total expected loss in data rate during the spectrum sensing process.

C. Related Work

The classic sequential hypothesis testing problem, pioneered by Wald [2], considers only a single stochastic process. For simple binary hypothesis testing, Wald showed that the Sequential Probability Ratio Test (SPRT) is optimal in terms of minimizing the expected sample size under given type I and type II error probability constraints. Various extensions to $M$-ary hypothesis testing and composite hypothesis testing were studied in [3]-[9] for a single process. In these cases, asymptotically optimal performance can be obtained as the error probability approaches zero.

A number of studies in the literature consider sequential detection over multiple processes. Unlike this work, which focuses on minimizing the total cost incurred by anomalous components, these existing results adopt the objective of minimizing the total detection delay. In particular, the problem of quickly detecting an idle period over multiple independent ON/OFF processes was considered in [10], where a threshold policy was shown to be optimal. The ON/OFF nature of the processes and the objective of minimizing the total detection delay make the problem considered in [10] fundamentally different from the one considered in this work. In [11], the problem of quickest detection of the emergence of primary users in multi-channel cognitive radio networks was considered. In [12], the problem of quickest detection of idle channels over $K$ independent channels was studied. The idle/busy state of each channel was assumed fixed over time, and the objective was to minimize the detection delay under error constraints. It was shown that the optimal policy is to carry out an independent SPRT over each channel, irrespective of the testing order. In contrast to [12], we show in this paper that the optimal policy in our model depends strongly on the testing order even when the processes are independent. In [13], the problem of identifying the first abnormal sequence among an infinite number of i.i.d. sequences was considered, and an optimal cumulative sum (CUSUM) test was established under this setting. Variations of the latter model have been studied in [14], [15]. The sequential search problem under the exclusive model was investigated in [16]-[19]. Optimal policies were derived for the problem of quickest search over Wiener processes [16]-[18]. It was shown in [16], [17] that the optimal policy is to select the sequence with the highest posterior probability of being the target at each given time. In [18], an SPRT-based solution was derived, which is equivalent to the optimal policy in the case of searching over Wiener processes. However, minimizing the total expected cost in our model leads to a different problem and consequently a different index policy.

The classic target whereabouts problem is also a detection problem over multiple processes. In this problem, multiple locations are searched to locate a target. The problem is often considered under the setting of fixed sample size, as in [20]-[23]. In [20]-[23], searching in a specific location provides a binary-valued measurement regarding the presence or absence of the target. In [22], Castanon considered the dynamic search problem under continuous observations: the observations from a location without the target and with the target have distributions $f$ and $g$, respectively. The optimal policy was established under a symmetry assumption that $f(x) = g(h - x)$ for some $h$.

The anomaly detection problem may also be considered as a variation of active hypothesis testing, in which the decision maker chooses and dynamically changes its observation model among a set of observation options. Classic and more recent studies of general active hypothesis testing problems can be found in [24]-[30].

D. Organization

In Section II we describe the system model and problem formulation. In Section III we propose a two-stage optimization problem that simplifies computation while preserving optimality. In Section IV we derive optimal algorithms under the independent and exclusive models for the simple hypothesis case. In Section V we extend our results to the composite hypothesis case and derive asymptotically optimal algorithms under the independent and exclusive models. In Section VI we provide numerical examples to illustrate the performance of the algorithms.
II. System Model and Problem Formulation

Consider a cyber system consisting of $K$ components, where each component may be in a normal state or an abnormal state. Define

$$H_0 = \{k : 1 \le k \le K,\ \text{component } k \text{ is healthy}\}, \qquad H_1 = \{k : 1 \le k \le K,\ \text{component } k \text{ is abnormal}\} \tag{1}$$

as the sets of the normal and abnormal components.

We consider two different anomaly models.

1) Exclusive model: One and only one component is abnormal; the a priori probability that component $k$ is the abnormal one is $\pi_k$, where $\sum_{k=1}^{K} \pi_k = 1$.

2) Independent model: Each component $k$ is abnormal with a priori probability $\pi_k$, independent of other components.

Every abnormal component $k$ incurs a cost $C_k$ ($0 < C_k < \infty$) per unit time until it is tested and identified. Components in a normal state do not incur cost. We focus on the case where only one component can be probed at a time. The resulting probing strategies apply to the case where a subset of the components can be probed simultaneously, and their performance in that case is studied via simulation examples in Section VI. When component $k$ is tested at time $t$, a measurement (or a vector of measurements) $y_k(t)$ is obtained; the measurements are independent over time. If component $k$ is healthy, $y_k(t)$ follows distribution $f_k^{(0)}$; if component $k$ is abnormal, $y_k(t)$ follows distribution $f_k^{(1)}$. We focus first on the simple hypothesis case, where the distributions $f_k^{(0)}, f_k^{(1)}$ are known. In Section V we extend our results to the composite hypothesis case, where the distributions have unknown parameters. We consider the case where switching across components is allowed only when the state of the currently probed component is declared.

The detection process starts at time $t = 1$. The random sample size required to make a decision regarding the state of component $k$ is denoted by $N_k$. We define $\tau_k$ as the stopping time (counted from the beginning of the first test at $t = 1$) at which the decision maker stops taking observations from component $k$ and declares its state. The vector of stopping times for the $K$ components is denoted by $\tau = (\tau_1, \dots, \tau_K)$. For example, assume that $K = 3$ and the decision maker tests the components in the order 3, 1, 2. Then $\tau_3 = N_3$, $\tau_1 = N_3 + N_1$, and $\tau_2 = N_3 + N_1 + N_2$.
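The bookkeeping in this example can be illustrated with a short Python sketch. The helper `stopping_times` and its dictionary arguments are hypothetical, not part of the original formulation: given a testing order and realized sample sizes $N_k$, each stopping time $\tau_k$ is the running sum of the sample sizes up to and including component $k$.

```python
from itertools import accumulate


def stopping_times(order, sample_sizes):
    """Return {k: tau_k} for components tested in `order` (hypothetical helper).

    order        -- list of component indices, e.g. [3, 1, 2]
    sample_sizes -- dict mapping component k to its realized sample size N_k
    """
    # Only one component is probed at a time, so tau_k is the total number
    # of samples taken up to and including the test of component k.
    cumulative = accumulate(sample_sizes[k] for k in order)
    return dict(zip(order, cumulative))


# The K = 3 example from the text, with hypothetical sample sizes.
N = {1: 4, 2: 7, 3: 2}
print(stopping_times([3, 1, 2], N))  # {3: 2, 1: 6, 2: 13}
```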

Let $\delta_k \in \{0, 1\}$ be the decision rule that the decision maker uses to declare the state of component $k$ at time $\tau_k$: $\delta_k = 0$ if the decision maker declares that component $k$ is in a healthy state (i.e., $H_0$), and $\delta_k = 1$ if the decision maker declares that component $k$ is in an abnormal state (i.e., $H_1$). The vector of decision rules for the $K$ components is denoted by $\delta = (\delta_1, \dots, \delta_K)$.

Let $\mathcal{K}(t)$ be the set of components whose states have not been declared by time $t$, and let $\phi(t)$ be the index of the component being tested at time $t$ (i.e., a selection rule). Let $\mathcal{Y}(t)$ be the set of all observations and actions up to time $t$. The selection rule $\phi(t)$ is a mapping from $\mathcal{Y}(t-1)$ to $\mathcal{K}(t)$, indicating which component is chosen to be tested at time $t$ among the components whose states have not been determined. Since switching across components is allowed only when the state of the currently probed component is declared, the selection rule satisfies $\phi(\tau_k - t) = \phi(\tau_k)$ for all $1 \le t \le N_k - 1$, $k = 1, 2, \dots, K$. The vector of selection rules over the time series is denoted by $\phi = (\phi(1), \phi(2), \dots)$. An admissible strategy $s$ is a sequence of $K$ sequential tests for the $K$ components, denoted by the tuple $s = (\tau, \delta, \phi)$.

The problem is to find a strategy $s$ that minimizes the total expected cost incurred by all the abnormal components during the entire detection process, subject to type I (false-alarm) and type II (miss-detect) error constraints for each component:

$$\inf_{s}\ \mathbb{E}\Big\{\sum_{k \in H_1} C_k \tau_k\Big\} \quad \text{s.t.} \quad P_k^{FA} \le \alpha_k,\ \ P_k^{MD} \le \beta_k \quad \forall k = 1, \dots, K. \tag{2}$$

We point out that the total cost defined in (2) does not include the cost incurred by miss-detected abnormal components. Since the error probabilities are typically required to be small, (2) approximates the actual loss well in practice.
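For a single realization, the quantity inside the expectation in (2) is a weighted sum of the stopping times of the abnormal components. The sketch below evaluates it for one run; the names `realized_cost`, `costs`, and `abnormal` are hypothetical. Averaging this quantity over many simulated runs would give a Monte Carlo estimate of the objective for a fixed strategy.

```python
from itertools import accumulate


def realized_cost(order, sample_sizes, costs, abnormal):
    """Cost sum_{k in H1} C_k * tau_k of one realization (hypothetical helper).

    order        -- testing order (list of component indices)
    sample_sizes -- dict k -> realized N_k
    costs        -- dict k -> C_k, the cost per unit time while abnormal
    abnormal     -- set of components that are actually abnormal (H1)
    """
    # Stopping times are running sums of the sample sizes in testing order.
    taus = dict(zip(order, accumulate(sample_sizes[k] for k in order)))
    # Normal components incur no cost; each abnormal one pays C_k until tau_k.
    return sum(costs[k] * taus[k] for k in abnormal)


# Hypothetical realization: order 3, 1, 2 with components 1 and 2 abnormal.
N = {1: 4, 2: 7, 3: 2}
C = {1: 5.0, 2: 1.0, 3: 3.0}
print(realized_cost([3, 1, 2], N, C, abnormal={1, 2}))  # 43.0
```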

We have adopted a model where switching across components is allowed only when the test of the currently chosen component is complete. This model is desirable in practical scenarios where switching among components results in additional cost or delay. It also reduces the memory requirement, since only observations from a single component need to be stored. Furthermore, it is advantageous from a computational complexity perspective. Detection problems involving multiple processes are partially observed Markov decision processes (POMDPs) [22], which have exponential complexity in general. As a result, computing optimal policies is intractable (except for some special observation distributions, as considered in [16], [22]). Thus, simplifying the search model is necessary to make the problem mathematically tractable and to provide insights and general design guidelines. Similar assumptions have been adopted in [13], [18], [19] to simplify the search model under different objectives.

III. Decoupling of Ordering and Sequential Testing

In this section, we show that the probing order and the sequential testing of each component can be decoupled. As a consequence, the solution to (2) can be obtained in two stages.

In the first stage, we solve the following optimization problem for every component $k$:

$$\inf_{N_k,\,\delta_k}\ \mathbb{E}(N_k \mid H_i),\ \ i = 0, 1 \quad \text{s.t.} \quad P_k^{FA} \le \alpha_k,\ \ P_k^{MD} \le \beta_k. \tag{3}$$

In the second stage, the problem is to find a selection rule $\phi$ that minimizes the objective function, given the solution to the $K$ subproblems specified in (3):

$$\inf_{\phi}\ \mathbb{E}\Big\{\sum_{k \in H_1} C_k \tau_k \,\Big|\, (N^*, \delta^*)\Big\}, \tag{4}$$

where

$$N^* = (N_1^*, \dots, N_K^*), \qquad \delta^* = (\delta_1^*, \dots, \delta_K^*) \tag{5}$$

denote the vectors of stopping times and decision rules, respectively, that solve the $K$ subproblems given in (3). Note that the stopping times $\tau = (\tau_1, \dots, \tau_K)$ are completely specified by $N^*$ and the selection rule $\phi^*$ that solves (4).
The formulation of the two-stage optimization problem allows us to decompose the original optimization problem (2) into $K+1$ subproblems, (3) and (4). In subsequent sections we show that the two-stage optimization problem preserves optimality under both the independent and exclusive models.


IV. The Simple Hypothesis Case

In this section we derive optimal solutions for both the independent and exclusive models when the observation distributions under both hypotheses are completely known. We discuss the implementation of the optimal policies in Section IV-C.


A. SPRT for Each Component

For the simple hypothesis case, the solution to the first-stage optimization problem (3) is given by the SPRT [2] as follows. Assume that the state of component $j$ has been declared at time $\tau_j$ and component $k$ is chosen to be tested at time $\tau_j + 1$. Let

$$L_k(n) = \prod_{r=\tau_j+1}^{\tau_j+n} \frac{f_k^{(1)}(y_k(r))}{f_k^{(0)}(y_k(r))} \tag{6}$$

be the Likelihood Ratio (LR) between the two hypotheses for component $k$ at stage $n$.

In the SPRT, the stopping and decision rules are given by comparing the LR with boundary values at each stage $n$ [2]. Specifically, let $A_k, B_k$ ($B_k > 1/A_k$) be the boundary values used by the SPRT for component $k$. The SPRT algorithm is carried out as follows:

• If $L_k(n) \in (A_k^{-1}, B_k)$, continue to take observations from component $k$.

• If $L_k(n) \ge B_k$, stop taking observations from component $k$ and declare it abnormal (i.e., $\delta_k = 1$). Clearly, $N_k = n$.

• If $L_k(n) \le A_k^{-1}$, stop taking observations from component $k$ and declare it normal (i.e., $\delta_k = 0$). Clearly, $N_k = n$.
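The three rules above can be sketched in Python as follows. The sketch tracks $\log L_k(n)$ instead of $L_k(n)$ for numerical stability; the function name `sprt` and the Gaussian mean-shift example ($f_k^{(0)} = \mathcal{N}(0,1)$, $f_k^{(1)} = \mathcal{N}(1,1)$) are illustrative assumptions, not taken from the original.

```python
import math
import random


def sprt(samples, llr, A, B):
    """Wald's SPRT for a single component, following the three rules above.

    samples -- iterable of observations y_k(1), y_k(2), ...
    llr     -- function y -> log(f_k^(1)(y) / f_k^(0)(y)), the per-sample log-LR
    A, B    -- boundary values; sampling continues while 1/A < L_k(n) < B
    Returns (delta_k, N_k), with delta_k = 1 (abnormal) or 0 (normal).
    """
    log_upper, log_lower = math.log(B), -math.log(A)
    log_L = 0.0  # log L_k(n), accumulated in the log domain to avoid overflow
    for n, y in enumerate(samples, start=1):
        log_L += llr(y)
        if log_L >= log_upper:
            return 1, n  # L_k(n) >= B_k: declare abnormal
        if log_L <= log_lower:
            return 0, n  # L_k(n) <= 1/A_k: declare normal
    raise RuntimeError("observations ran out before the SPRT stopped")


# Hypothetical example: f^(0) = N(0, 1), f^(1) = N(1, 1), so log(f1/f0)(y) = y - 1/2.
rng = random.Random(0)
stream = (rng.gauss(1.0, 1.0) for _ in range(100_000))  # an abnormal component
delta_k, n_k = sprt(stream, lambda y: y - 0.5, A=99.0, B=99.0)
```

Because the per-sample log-LR is summed, the test only needs the running statistic, which matches the memory argument for non-switching probing made in Section II.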

Implementation of the SPRT requires computing $A_k$ and $B_k$ so that the constraints on the error probabilities are met. In general, exact determination of the boundary values is laborious and depends on the observation distribution. Wald's approximation can be applied to simplify the computation [2]:

$$A_k \approx \frac{1 - \alpha_k}{\beta_k}, \qquad B_k \approx \frac{1 - \beta_k}{\alpha_k}. \tag{7}$$

Wald's approximation performs well for small $\alpha_k, \beta_k$ and is asymptotically optimal as $\alpha_k, \beta_k$ approach zero. Since type I and type II errors are typically required to be small, Wald's approximation is widely used in practice [2].
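A minimal sketch of Wald's approximation (7) is given below; the function name `wald_boundaries` is hypothetical. The input check $\alpha_k + \beta_k < 1$ is equivalent to $A_k B_k > 1$, i.e., the condition $B_k > 1/A_k$ required by the SPRT.

```python
def wald_boundaries(alpha, beta):
    """Wald's approximations (7): A_k ~ (1 - alpha_k)/beta_k, B_k ~ (1 - beta_k)/alpha_k."""
    # alpha + beta < 1 guarantees A * B > 1, i.e., B_k > 1/A_k as required above.
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, beta < 1 and alpha + beta < 1")
    return (1 - alpha) / beta, (1 - beta) / alpha


A, B = wald_boundaries(alpha=0.01, beta=0.05)
print(round(A, 2), round(B, 2))  # 19.8 95.0
```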


B. Optimal Index Policies

We now consider the second-stage optimization problem specified by (4) and (5). Our main result establishes the optimal selection rule as the $\pi cN$-rule for the independent model and the $\pi cN_0$-rule for the exclusive model. Specifically, the $\pi cN$-rule dictates that the components be tested in decreasing order of $\pi_k C_k / \mathbb{E}(N_k)$, and the $\pi cN_0$-rule dictates that the components be tested in decreasing order of $\pi_k C_k / \mathbb{E}(N_k \mid H_0)$. Note that these optimal selection rules are open-loop policies: the testing orders can be determined offline (see Section IV-C for the computation of the indices). With the optimal solution to (3), the optimal anomaly detection strategy is to carry out a series of SPRTs on the components ordered according to either the $\pi cN$-rule or the $\pi cN_0$-rule. The resulting strategies are thus referred to as $\pi cN$-SPRT and $\pi cN_0$-SPRT.
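Both rules reduce to a single sort once the expected sample sizes are available. The sketch below assumes dictionaries keyed by component index; the function names and all numerical values are hypothetical, chosen so that the two rules disagree.

```python
def expected_sample_size(pi_k, en1_k, en0_k):
    """E(N_k) = pi_k E(N_k | H1) + (1 - pi_k) E(N_k | H0), as in (10)."""
    return pi_k * en1_k + (1 - pi_k) * en0_k


def pi_c_n_order(pi, cost, en):
    """piCN-rule (independent model): decreasing pi_k C_k / E(N_k)."""
    return sorted(pi, key=lambda k: pi[k] * cost[k] / en[k], reverse=True)


def pi_c_n0_order(pi, cost, en0):
    """piCN0-rule (exclusive model): decreasing pi_k C_k / E(N_k | H0)."""
    return sorted(pi, key=lambda k: pi[k] * cost[k] / en0[k], reverse=True)


# Hypothetical numbers: component 1 is slow to clear when normal but quick to
# confirm when abnormal; component 2 is the opposite.
pi = {1: 0.5, 2: 0.5}
cost = {1: 1.0, 2: 1.0}
en0 = {1: 10.0, 2: 4.0}  # E(N_k | H0)
en1 = {1: 2.0, 2: 20.0}  # E(N_k | H1)
en = {k: expected_sample_size(pi[k], en1[k], en0[k]) for k in pi}  # {1: 6.0, 2: 12.0}

print(pi_c_n_order(pi, cost, en))    # [1, 2]
print(pi_c_n0_order(pi, cost, en0))  # [2, 1]
```

The orders differ because, under the exclusive model, only the time spent clearing normal components delays the detection of the abnormal one, so component 2's short $\mathbb{E}(N_2 \mid H_0)$ dominates.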

The index selection rules $\pi cN$ and $\pi cN_0$ are intuitively satisfying. The priority of component $k$ in the testing order should be higher as the cost $C_k$ increases or as the a priori probability $\pi_k$ of being abnormal increases. Under the independent model, the priority of component $k$ should also be higher as the expected sample size $\mathbb{E}(N_k)$ decreases, since $\mathbb{E}(N_k)$ contributes to the cost of every component that is tested after component $k$. Under the exclusive model, on the other hand, the priority of component $k$ depends on $\mathbb{E}(N_k \mid H_0)$ rather than $\mathbb{E}(N_k)$. The reason is that if component $k$ is abnormal, its testing time adds no cost for the other components (since only one component is abnormal). If component $k$ is healthy, however, $\mathbb{E}(N_k \mid H_0)$ contributes to the cost of the components (which may be abnormal) tested after component $k$.

The optimality of the algorithms is shown in the following theorem.

Theorem 1: Under the independent and exclusive models, the $\pi cN$-SPRT and $\pi cN_0$-SPRT algorithms, respectively, solve the original optimization problem (2).

Proof: See Appendices VIII-A and VIII-B. ■

Although the $\pi cN$-rule and the $\pi cN_0$-rule are open-loop policies, Theorem 1 shows that they are optimal among both open-loop and closed-loop selection rules. It should be noted that open-loop policies may not preserve optimality under nonlinear cost functions or other correlated models. In these cases, the optimal testing order might need to be updated dynamically based on the realizations of each individual test in terms of the test outcome or the detection time.
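The ordering part of Theorem 1 can be checked numerically for small $K$ by comparing the index order with an exhaustive search over all $K!$ orders. The sketch rests on our own bookkeeping of the expected cost, stated here as an assumption rather than taken from the proof: under the independent model, component $k$ contributes $\pi_k C_k [\mathbb{E}(N_k \mid H_1) + \sum_{j \text{ before } k} \mathbb{E}(N_j)]$, and under the exclusive model the inner sum uses $\mathbb{E}(N_j \mid H_0)$, since every component other than the abnormal one is normal. All function names and numbers are hypothetical.

```python
from itertools import permutations


def expected_cost(order, pi, cost, en1, delay):
    """Expected total cost for a fixed testing order.

    delay[j] is the expected time component j adds before later components:
    E(N_j) under the independent model, E(N_j | H0) under the exclusive model.
    """
    total, elapsed = 0.0, 0.0
    for k in order:
        total += pi[k] * cost[k] * (elapsed + en1[k])
        elapsed += delay[k]
    return total


def index_order(pi, cost, delay):
    """Decreasing pi_k C_k / delay_k (the piCN or piCN0 rule, depending on delay)."""
    return sorted(pi, key=lambda k: pi[k] * cost[k] / delay[k], reverse=True)


def brute_force_order(pi, cost, en1, delay):
    """Exhaustive search over all K! orders (only feasible for small K)."""
    return min(permutations(pi), key=lambda o: expected_cost(o, pi, cost, en1, delay))


# Hypothetical parameters for K = 4 (priors sum to 1 so that the same numbers
# can also be used for the exclusive model).
pi = {1: 0.1, 2: 0.4, 3: 0.2, 4: 0.3}
cost = {1: 5.0, 2: 1.0, 3: 2.0, 4: 3.0}
en0 = {1: 8.0, 2: 3.0, 3: 12.0, 4: 5.0}  # E(N_k | H0)
en1 = {1: 4.0, 2: 6.0, 3: 2.0, 4: 7.0}   # E(N_k | H1)
en = {k: pi[k] * en1[k] + (1 - pi[k]) * en0[k] for k in pi}  # E(N_k), as in (10)

for name, delay in [("independent (piCN)", en), ("exclusive (piCN0)", en0)]:
    idx = index_order(pi, cost, delay)
    best = brute_force_order(pi, cost, en1, delay)
    print(name, idx, expected_cost(idx, pi, cost, en1, delay),
          expected_cost(best, pi, cost, en1, delay))
```

Since the $\mathbb{E}(N_k \mid H_1)$ term does not depend on the order, this bookkeeping reduces to a weighted sum of completion times with weights $\pi_k C_k$, which is why the index order matches the exhaustive search.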

The $\pi cN$-rule and the $\pi cN_0$-rule bear some similarity to the result developed in [31], where the problem of ordering independent operations with given processing times was considered. It was shown that the optimal selection rule for minimizing an expected weighted sum of the completion times of all the operations is to select the operations in decreasing order of $C_k / \mathbb{E}(N_k)$, where $C_k$ and $\mathbb{E}(N_k)$ are the weight and the expected processing time of operation $k$, respectively. However, the problem in (4) is different. First, each component may be normal or abnormal (rather than having a given processing time with a fixed distribution), and the expected sample size depends on the component state. Second, the objective is to minimize an expected weighted sum of the stopping times of abnormal components only. Third, under the exclusive model, the states of the $K$ components are dependent. Furthermore, the original optimization problem (2) also includes the stopping rules, which control the expected sample size.

C. Computing the Indices

Arranging the components according to the $\pi cN$-rule or the $\pi cN_0$-rule can be done in $O(K \log K)$ time via sorting algorithms. However, computing the expected sample sizes $\mathbb{E}(N_k \mid H_i)$ for all $k = 1, 2, \dots, K$ can be involved. In general, it is difficult to obtain a closed-form expression for $\mathbb{E}(N_k \mid H_i)$.
One way to evaluate $\mathbb{E}(N_k \mid H_i)$ is to perform off-line simulations (i.e., carrying out $K$ independent SPRTs for the $K$ components). Another way is to use a closed-form approximation, as follows. Since the solution to (3) is given by the SPRT, Wald's approximation can be applied [2]. For every $i, j = 0, 1$, let

$$D_k(i \| j) = \mathbb{E}_i\left[\log \frac{f_k^{(i)}(y_k(1))}{f_k^{(j)}(y_k(1))}\right] \tag{8}$$

be the Kullback-Leibler (KL) divergence between the hypotheses $H_i$ and $H_j$, where the expectation is taken with respect to $f_k^{(i)}$.
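As a worked illustration (an assumed example, not from the original), consider a Gaussian mean shift with $f_k^{(0)} = \mathcal{N}(\mu_0, \sigma^2)$ and $f_k^{(1)} = \mathcal{N}(\mu_1, \sigma^2)$. Since $\log\big(f_k^{(0)}(y)/f_k^{(1)}(y)\big) = \big[(y-\mu_1)^2 - (y-\mu_0)^2\big]/(2\sigma^2)$, and under $f_k^{(0)}$ we have $\mathbb{E}_0[(y-\mu_1)^2] = \sigma^2 + (\mu_0-\mu_1)^2$ and $\mathbb{E}_0[(y-\mu_0)^2] = \sigma^2$, the definition (8) gives

$$D_k(0\|1) = \frac{\big(\sigma^2 + (\mu_0 - \mu_1)^2\big) - \sigma^2}{2\sigma^2} = \frac{(\mu_1 - \mu_0)^2}{2\sigma^2},$$

and by symmetry $D_k(1\|0)$ takes the same value. For $\mu_0 = 0$, $\mu_1 = 1$, and $\sigma = 1$, both divergences equal $1/2$.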

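The expected sample size conditioned on each hypothesis is well approximated by [2]:

$$\mathbb{E}(N_k \mid H_0) \approx \frac{(1-\alpha_k)\log \tilde{A}_k - \alpha_k \log \tilde{B}_k}{D_k(0\|1)}, \qquad \mathbb{E}(N_k \mid H_1) \approx \frac{(1-\beta_k)\log \tilde{B}_k - \beta_k \log \tilde{A}_k}{D_k(1\|0)}, \tag{9}$$

where $\tilde{A}_k = (1-\alpha_k)/\beta_k$ and $\tilde{B}_k = (1-\beta_k)/\alpha_k$ are the approximations to $A_k, B_k$ given in (7). Note that the approximations in (9) approach the exact expected sample sizes, $\mathbb{E}(N_k \mid H_0) \sim -\log \beta_k / D_k(0\|1)$ and $\mathbb{E}(N_k \mid H_1) \sim -\log \alpha_k / D_k(1\|0)$, as the error constraints approach zero.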
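The approximations (9) are straightforward to evaluate once the KL divergences are known, and the results can then be mixed as in (10) to obtain $\mathbb{E}(N_k)$ for the $\pi cN$-rule. A minimal sketch follows; the function name `wald_asn` and the parameter values (a unit-variance Gaussian mean shift of 1, so both divergences equal $1/2$) are hypothetical.

```python
import math


def wald_asn(alpha, beta, d01, d10):
    """Wald's approximations (9) to E(N_k | H0) and E(N_k | H1) for the SPRT.

    alpha, beta -- error constraints alpha_k, beta_k
    d01, d10    -- KL divergences D_k(0||1) and D_k(1||0), both positive
    """
    log_a = math.log((1 - alpha) / beta)  # log of A~_k from (7)
    log_b = math.log((1 - beta) / alpha)  # log of B~_k from (7)
    en0 = ((1 - alpha) * log_a - alpha * log_b) / d01
    en1 = ((1 - beta) * log_b - beta * log_a) / d10
    return en0, en1


# Hypothetical component: unit-variance Gaussian mean shift of 1, so D = 1/2 both ways.
en0, en1 = wald_asn(alpha=0.01, beta=0.01, d01=0.5, d10=0.5)
pi_k = 0.2                           # hypothetical prior of being abnormal
en = pi_k * en1 + (1 - pi_k) * en0   # E(N_k) as in (10)
print(round(en0, 2), round(en1, 2))  # 9.01 9.01
```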

The expected sample size required to make a decision regarding the state of component $k$ is given by

$$\mathbb{E}(N_k) = \pi_k\, \mathbb{E}(N_k \mid H_1) + (1 - \pi_k)\, \mathbb{E}(N_k \mid H_0), \tag{10}$$

where, when (9) is substituted, the approximation approaches the exact expected sample size for small $\alpha_k, \beta_k$.

Note that optimality of the algorithms is preserved as long as the order of the indices is preserved (i.e., the exact index values are not required for optimality). Therefore, optimality can be achieved in practice even when Wald's approximation is used.

V. The Composite Hypothesis Case

In the previous section we focused on the simple hypothesis case, where the distributions under both hypotheses are completely known. For this case, the SPRT was applied to solve (3). However, in numerous cases there is uncertainty in the observation distributions.

For example, consider a one-parameter distribution $f(y \mid \theta_k)$, where it is required to test $\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)} > \theta_k^{(0)}$. As discussed in [2], the SPRT can be applied to this problem by testing $\theta_k = \theta_k^{(0)}$ against $\theta_k = \theta_k^{(1)}$, where the boundary values are set such that the error constraints are satisfied at $\theta_k^{(0)}, \theta_k^{(1)}$. For some important cases, such as an exponential family of distributions, this sequential test has the property that the type I and type II errors are less than $\alpha_k$ and $\beta_k$ for all $\theta_k \le \theta_k^{(0)}$ and $\theta_k \ge \theta_k^{(1)}$, respectively. However, while the SPRT minimizes the expected sample size at $\theta_k = \theta_k^{(0)}, \theta_k^{(1)}$, it is highly suboptimal for other values of $\theta_k$, as demonstrated in Section VI. Therefore, other techniques should be considered in the composite hypothesis case.

Let $\theta_k$ be a vector of unknown parameters of component $k$. The observations $\{y_k(i)\}_{i \ge 1}$ are drawn from a common distribution $f(y \mid \theta_k)$, $\theta_k \in \Theta_k$, where $\Theta_k$ is the parameter space of component $k$. If component $k$ is healthy, then $\theta_k \in \Theta_k^{(0)}$; if component $k$ is abnormal, then $\theta_k \in \Theta_k \setminus \Theta_k^{(0)}$. Let $\Theta_k^{(0)}, \Theta_k^{(1)}$ be disjoint subsets of $\Theta_k$, where $I_k = \Theta_k \setminus (\Theta_k^{(0)} \cup \Theta_k^{(1)}) \neq \emptyset$ is an indifference region.¹ When $\theta_k \in I_k$, the detector is indifferent regarding the state of component $k$. Hence, there are no constraints on the error probabilities for $\theta_k \in I_k$. The hypothesis test regarding component $k$ is to test

$$\theta_k \in \Theta_k^{(0)} \quad \text{against} \quad \theta_k \in \Theta_k^{(1)}.$$

Narrowing $I_k$ comes at the price of increasing the sample size. Let

$$\hat{\theta}_k(n) = \arg\max_{\theta_k \in \Theta_k} \prod_{r=1}^{n} f(y_k(r) \mid \theta_k), \qquad \hat{\theta}_k^{(i)}(n) = \arg\max_{\theta_k \in \Theta_k^{(i)}} \prod_{r=1}^{n} f(y_k(r) \mid \theta_k), \tag{11}$$

be the Maximum-Likelihood Estimates (MLEs) of the parameters over the parameter spaces $\Theta_k$ and $\Theta_k^{(i)}$ at stage $n$, respectively.

In contrast to the SPRT in the simple hypothesis case, the theory of sequential tests of composite hypotheses does not provide tests that minimize the expected sample size under given error constraints. Nevertheless, asymptotically optimal performance can be obtained as the error probability approaches zero.

First, we provide an overview of existing sequential tests for composite hypotheses that are relevant to our problem. Next, we apply these techniques to solve (2).


A. Existing Sequential Tests for Composite Hypothesis Testing

The key idea is to use the estimated parameters to perform a one-sided sequential test to reject $H_0$ and a one-sided sequential test to reject $H_1$. Note that these techniques were introduced for a single process, whereas in this paper we apply sequential tests to $K$ components. Thus, we use the subscript $k$ to denote the component index.

1)  Sequential  Generalized  Likelihood  Ratio  Test  (SGLRT): 
We  refer  to  sequential  tests  that  use  the  Generalized  Likelihood 
Ratio  (GLR)  statistics  as  the  SGLRT. 

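For $i = 0, 1$, let

$$L_k^{(i),\mathrm{GLR}}(n) = \log\frac{\prod_{r=1}^{n} f\big(y_k(r) \mid \hat{\theta}_k(n)\big)}{\prod_{r=1}^{n} f\big(y_k(r) \mid \hat{\theta}_k^{(i)}(n)\big)} \quad (12)$$

be the GLR statistics used to reject hypothesis $H_i$ at stage $n$, where $\hat{\theta}_k(n)$ and $\hat{\theta}_k^{(i)}(n)$ denote the maximum likelihood estimates (MLEs) of $\theta_k$ over $\Theta_k$ and $\Theta_k^{(i)}$, respectively, based on the first $n$ observations. Let

$$N_k^{(i)} = \inf\big\{n : L_k^{(i),\mathrm{GLR}}(n) \ge B_k^{(i)}\big\} \quad (13)$$

be the stopping rule used to reject hypothesis $H_i$, where $B_k^{(i)}$ is the boundary value.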

¹The assumption of an indifference region is widely used in the theory of sequential testing of composite hypotheses to derive asymptotically optimal performance. Nevertheless, in some cases this assumption can be removed. For more details, the reader is referred to [5].
For each component $k$, the decision maker stops sampling when $N_k = \min\{N_k^{(0)}, N_k^{(1)}\}$. If $N_k = N_k^{(0)}$, component $k$ is declared abnormal (i.e., $H_0$ is rejected). If $N_k = N_k^{(1)}$, component $k$ is declared normal (i.e., $H_0$ is accepted).

Schwartz [3] first studied the SGLRT for a one-parameter exponential family, assigning a cost $c$ to each observation and a loss function to wrong decisions. It was shown that setting $B_k^{(i)} = \log(c^{-1})$ asymptotically minimizes the Bayes risk as $c$ approaches zero. A refinement was studied by Lai [5], [7], who used a time-varying boundary value $B_k^{(i)}(n) \sim \log\big((nc)^{-1}\big)$. Lai showed that for a multivariate exponential family this scheme asymptotically minimizes both the Bayes risk and the expected sample size subject to error constraints as $c$ approaches zero [7].
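The SGLRT can be sketched for the Poisson traffic model of Section VI (an assumption here; the description above is model-agnostic). The sketch uses one fixed boundary for both one-sided tests, such as Schwartz's choice $B = \log(1/c)$; Lai's time-varying boundary could be substituted. The function names are illustrative, and the constant $-\sum \log y!$ term is dropped because it cancels in every ratio.

```python
import math
from typing import Iterable, Tuple


def poisson_loglik(total: float, n: int, theta: float) -> float:
    """Poisson log-likelihood of n counts summing to `total`, without the
    -sum(log y!) term, which cancels in the ratio (12)."""
    if total == 0:
        return -n * theta
    return total * math.log(theta) - n * theta


def sglrt_poisson(counts: Iterable[int], theta0: float, theta1: float,
                  boundary: float) -> Tuple[str, int]:
    """SGLRT for H0: theta <= theta0 versus H1: theta >= theta1."""
    total, n = 0, 0
    for y in counts:
        total += y
        n += 1
        mle = total / n                          # MLE over the whole space
        ll_max = poisson_loglik(total, n, mle)
        # Denominator of L^(0): MLE restricted to Theta^(0) = {theta <= theta0}
        l0 = ll_max - poisson_loglik(total, n, min(mle, theta0))
        # Denominator of L^(1): MLE restricted to Theta^(1) = {theta >= theta1}
        l1 = ll_max - poisson_loglik(total, n, max(mle, theta1))
        if l0 >= boundary:
            return "abnormal", n                 # H0 rejected
        if l1 >= boundary:
            return "normal", n                   # H1 rejected
    return "undecided", n


print(sglrt_poisson([30] * 50, 19.0, 21.0, math.log(100)))  # -> ('abnormal', 2)
```

Inside the indifference region both statistics can grow, so the sketch checks $L^{(0)}$ first as an arbitrary tie-break.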

2)  Sequential  Adaptive  Likelihood  Ratio  Test  (SALRT):  We 
refer  to  sequential  tests  that  use  the  Adaptive  Likelihood  Ratio 
(ALR)  statistics  as  the  SALRT. 

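For $i = 0, 1$, let

$$L_k^{(i),\mathrm{ALR}}(n) = \log\frac{\prod_{r=1}^{n} f\big(y_k(r) \mid \hat{\theta}_k(r-1)\big)}{\prod_{r=1}^{n} f\big(y_k(r) \mid \hat{\theta}_k^{(i)}(n)\big)} \quad (14)$$

be the ALR statistics used to reject hypothesis $H_i$ at stage $n$, where $\hat{\theta}_k(r-1)$ is the estimate of $\theta_k$ based on the first $r-1$ observations. Let

$$N_k^{(i)} = \inf\big\{n : L_k^{(i),\mathrm{ALR}}(n) \ge B_k^{(i)}\big\} \quad (15)$$

be the stopping rule used to reject hypothesis $H_i$, where $B_k^{(i)}$ is the boundary value.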

For each component $k$, the decision maker stops sampling when $N_k = \min\{N_k^{(0)}, N_k^{(1)}\}$. If $N_k = N_k^{(0)}$, component $k$ is declared abnormal. If $N_k = N_k^{(1)}$, component $k$ is declared normal.

The SALRT was first introduced by Robbins and Siegmund [4] to design power-one sequential tests. Pavlov used it to design asymptotically (as the error probability approaches zero) optimal (in terms of minimizing the expected sample size subject to error constraints) tests for composite hypothesis testing of the multivariate exponential family [6]. Tartakovsky established asymptotically optimal performance for a more general multivariate family of distributions [8].

The advantage of using the SALRT is that setting $B_k^{(0)} = \log(1/\alpha_k)$ and $B_k^{(1)} = \log(1/\beta_k)$ satisfies the error probability constraints in (3). Such a simple setting cannot be applied to the SGLRT; thus, implementing the SALRT is much simpler than implementing the SGLRT. The disadvantage of using the SALRT is that poor early estimates (based on a small number of observations) can never be revised, even after a large number of observations has been collected.
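A minimal SALRT sketch, again assuming Poisson counts as in Section VI. The predictable estimate in the numerator, $(\theta_{\text{init}} + \sum_{j<r} y_k(j))/r$, is a hypothetical choice (a shrunk sample mean that stays positive); any estimate that depends only on the first $r-1$ observations fits (14). The boundaries follow the setting $B_k^{(0)} = \log(1/\alpha_k)$, $B_k^{(1)} = \log(1/\beta_k)$ discussed above, and the function names are illustrative.

```python
import math
from typing import Iterable, Tuple


def poisson_loglik(total: float, n: int, theta: float) -> float:
    """Poisson log-likelihood of n counts summing to `total`, without the
    -sum(log y!) term, which cancels in the ratio (14)."""
    if total == 0:
        return -n * theta
    return total * math.log(theta) - n * theta


def salrt_poisson(counts: Iterable[int], theta0: float, theta1: float,
                  alpha: float, beta: float,
                  theta_init: float) -> Tuple[str, int]:
    """SALRT for H0: theta <= theta0 versus H1: theta >= theta1."""
    b0, b1 = math.log(1 / alpha), math.log(1 / beta)
    adaptive = 0.0       # sum_r log f(y(r) | estimate from the first r-1 samples)
    total, n = 0, 0
    for y in counts:
        est = (theta_init + total) / (n + 1)   # hypothetical estimator, past data only
        adaptive += y * math.log(est) - est
        total += y
        n += 1
        mle = total / n
        l0 = adaptive - poisson_loglik(total, n, min(mle, theta0))
        l1 = adaptive - poisson_loglik(total, n, max(mle, theta1))
        if l0 >= b0:
            return "abnormal", n               # H0 rejected
        if l1 >= b1:
            return "normal", n                 # H1 rejected
    return "undecided", n


print(salrt_poisson([30] * 50, 19.0, 21.0, 0.01, 0.01, 20.0))
```

Because each numerator term is fixed once it is added, a poor early estimate stays in the sum, which is the drawback noted above.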


B.  Asymptotically  Optimal  Index  Policies 

Intuitively, the selection rules in the composite hypothesis case remain the same as in the simple hypothesis case. The resulting strategies are thus referred to as the $\pi cN$-SGLRT/SALRT and $\pi cN_0$-SGLRT/SALRT algorithms. In the following theorems, we show that the $\pi cN$-SGLRT/SALRT and $\pi cN_0$-SGLRT/SALRT algorithms are asymptotically optimal in terms of minimizing the objective function subject to the error constraints in (2) as the error probabilities approach zero.² When deriving asymptotics, we assume that the error probabilities approach zero for all $k$ in such a way that the asymptotic optimality property (in terms of minimizing the expected sample size subject to the error constraints) holds for each single process for both the SGLRT and the SALRT, as discussed in Section V-A.

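Theorem 2: Consider the independent model under the composite hypothesis case. Let $(\tau^*, \delta^*, \phi^*)$ be the optimal solution to (2). Let $(\tau^{A3}, \delta^{A3}, \phi^{A3})$ be the solution achieved by the $\pi cN$-SGLRT/SALRT algorithm. Then, as the error probabilities approach zero for all $k$, we obtain:

$$\mathbb{E}\Big\{\sum_{k\in\mathcal{H}_1} c_k\tau_k \,\Big|\, (\tau^{A3},\delta^{A3},\phi^{A3})\Big\} \sim \mathbb{E}\Big\{\sum_{k\in\mathcal{H}_1} c_k\tau_k \,\Big|\, (\tau^*,\delta^*,\phi^*)\Big\}, \quad (16)$$

where $a \sim b$ denotes $a/b \to 1$.

Proof: See Appendix VIII-C. ■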

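Theorem 3: Consider the exclusive model under the composite hypothesis case. Let $(\tau^*, \delta^*, \phi^*)$ be the optimal solution to (2). Let $(\tau^{A4}, \delta^{A4}, \phi^{A4})$ be the solution achieved by the $\pi cN_0$-SGLRT/SALRT algorithm. Then, as the error probabilities approach zero for all $k$, we obtain:

$$\mathbb{E}\Big\{\sum_{k\in\mathcal{H}_1} c_k\tau_k \,\Big|\, (\tau^{A4},\delta^{A4},\phi^{A4})\Big\} \sim \mathbb{E}\Big\{\sum_{k\in\mathcal{H}_1} c_k\tau_k \,\Big|\, (\tau^*,\delta^*,\phi^*)\Big\}. \quad (17)$$

Proof: See Appendix VIII-D. ■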


C.  Computing  the  Indices 

Arranging the components in decreasing order of $\pi_k c_k/\mathbb{E}(N_k)$ or $\pi_k c_k/\mathbb{E}(N_k \mid H_0)$ requires one to compute the expected sample size $\mathbb{E}(N_k \mid H_i)$ for all $k = 1, 2, \ldots, K$. In general, it is difficult to obtain a closed-form expression for the exact value of $\mathbb{E}(N_k \mid H_i)$. However, we can use the asymptotic properties of the tests to obtain a closed-form approximation of $\mathbb{E}(N_k \mid H_i)$, which approaches the exact expected sample size as the error probabilities approach zero.

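For every $i = 0, 1$, let

$$D_k(\theta_k \,\|\, \lambda) = \mathbb{E}_{\theta_k}\left[\log\frac{f(y \mid \theta_k)}{f(y \mid \lambda)}\right] \quad (18)$$

be the KL divergence between the real value of $\theta_k$ and $\lambda$, where the expectation is taken with respect to $f(y \mid \theta_k)$, and let

$$D_k^{(i)}(\theta_k) = \inf_{\lambda \in \Theta_k^{(i)}} D_k(\theta_k \,\|\, \lambda). \quad (19)$$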


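Let $I_k^{(0)}, I_k^{(1)}$ be disjoint subsets of $I_k$ with $I_k = I_k^{(0)} \cup I_k^{(1)}$, such that for all $\theta_k \in I_k^{(i)}$ we have $B_k^{(1-i)}/D_k^{(1-i)}(\theta_k) \le B_k^{(i)}/D_k^{(i)}(\theta_k)$, for $i = 0, 1$ (i.e., on $I_k^{(i)}$ the test that rejects $H_{1-i}$ is expected to stop first). Let $P^{(i)}(\theta_k)$ be a prior distribution on $\theta_k$ over $\Theta_k^{(i)} \cup I_k^{(i)}$ (corresponding to $H_i$). Then, as the error probabilities approach zero, the conditional expected sample sizes are given by [6], [7]:

$$\mathbb{E}(N_k \mid H_0) \sim \int \frac{B_k^{(1)}}{D_k^{(1)}(\theta_k)}\, dP^{(0)}(\theta_k), \qquad \mathbb{E}(N_k \mid H_1) \sim \int \frac{B_k^{(0)}}{D_k^{(0)}(\theta_k)}\, dP^{(1)}(\theta_k). \quad (20)$$

²As shown in the proofs of Theorems 2 and 3, the index policies are still optimal in terms of testing order. The asymptotic optimality is due to the performance of the sequential test under the composite hypothesis case.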


The expected sample size required to make a decision regarding the state of component $k$ is given by:

$$\mathbb{E}(N_k) = \pi_k\,\mathbb{E}(N_k \mid H_1) + (1 - \pi_k)\,\mathbb{E}(N_k \mid H_0), \quad (21)$$

which can be well approximated for small error probabilities using (20).


VI.  Numerical  Examples 

In this section we present numerical examples to illustrate the performance of the algorithms. Consider a cyber network consisting of $K$ components (which can be routers, paths, etc.), as discussed in Section I-B. Assume that an intruder tries to launch a DoS or a Reduction of Quality (RoQ) attack by sending a large number of packets to a component. RoQ attacks inflict damage on the component while keeping a low profile to avoid detection; unlike DoS attacks, they do not cause denial of service.

To detect such attacks, the IDS performs traffic-based anomaly detection. It monitors the traffic at each component to decide whether the component is compromised. Roughly speaking, if the actual arrival rate is significantly higher than the arrival rate under the normal state, then the IDS should declare that the component is in an abnormal state. A similar traffic-based detection technique was proposed in [32] for a different model, considering a single process without switching to other components. For each component $k$, we assume that packets arrive according to a Poisson process with rate $\theta_k$. When component $k$ is tested, the IDS collects an observation $y_k(n) \in \mathbb{N}_0$ every time unit, which represents the number of packets that arrived in the interval $(n-1, n)$. Assume that the IDS considers component $k$ as normal if $\theta_k \le \theta_k^{(0)}$, and tests $\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)}$ (i.e., $I_k = \{\theta_k \mid \theta_k^{(0)} < \theta_k < \theta_k^{(1)}\}$ is the indifference region). We set $c_k = \theta_k^{(0)}$. As discussed in Section I-B, under this setting the optimization problem minimizes the maximal damage to the network in terms of packet loss.
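The observation model can be simulated directly: a Poisson process with rate $\theta_k$ has i.i.d. exponential inter-arrival times, and $y_k(n)$ counts the arrivals that fall in the $n$-th unit interval. The sketch below uses only Python's standard `random` module; the function name and the example rates are illustrative.

```python
import random
from typing import List


def poisson_process_counts(rate: float, num_intervals: int,
                           rng: random.Random) -> List[int]:
    """Return y(1..num_intervals): arrivals of a rate-`rate` Poisson process
    in each unit interval (interval endpoints have probability zero)."""
    counts = [0] * num_intervals
    t = rng.expovariate(rate)              # time of the first arrival
    while t < num_intervals:
        counts[int(t)] += 1                # t in [n-1, n) is stored at index n-1
        t += rng.expovariate(rate)         # next exponential inter-arrival gap
    return counts


rng = random.Random(0)
normal = poisson_process_counts(19.0, 5, rng)     # a rate at the normal level
attacked = poisson_process_counts(40.0, 5, rng)   # an elevated arrival rate
print(normal, attacked)
```

Each count is then Poisson-distributed with mean $\theta_k$, which is the $\mathrm{Poi}(\cdot)$ observation model used next.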

A.  Simple  Hypothesis  Case 

We consider the case where the observations follow Poisson distributions, $y_k(n) \sim \mathrm{Poi}(\theta_k^{(0)})$ or $y_k(n) \sim \mathrm{Poi}(\theta_k^{(1)})$, depending on whether component $k$ is healthy or abnormal, respectively, where $\theta_k^{(0)}, \theta_k^{(1)}$ are known to the IDS. To implement the $\pi cN$-SPRT and $\pi cN_0$-SPRT algorithms (which are optimal in this scenario for the independent and exclusive models, respectively), we need to compute the LR between the hypotheses, defined in (6), and the expected sample sizes under the hypotheses, which can be well approximated by (9). Let $\Lambda_k(n) = \log L_k(n)$ be the log-likelihood ratio (LLR) between the two hypotheses of component $k$ at stage $n$, where $L_k(n)$ is defined in (6). After some algebraic manipulation, it can be verified that the LLR is given by:

$$\Lambda_k(n) = -n\left(\theta_k^{(1)} - \theta_k^{(0)}\right) + \log\left(\frac{\theta_k^{(1)}}{\theta_k^{(0)}}\right)\sum_{r=1}^{n} y_k(r). \quad (22)$$

It can be verified that the KL divergence between the hypotheses $H_i$ and $H_j$, defined in (8), is given by:

$$D_k(i \,\|\, j) = \theta_k^{(j)} - \theta_k^{(i)} + \theta_k^{(i)}\log\left(\frac{\theta_k^{(i)}}{\theta_k^{(j)}}\right). \quad (23)$$

Substituting (23) in (9) yields the required approximation to the expected sample size. We note that the optimal order of the indices was preserved when using the approximation in (9) in all numerical examples in this section.
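The quantities above fit in a few lines. The LLR increments follow (22) and the KL divergence follows (23); for the expected sample sizes the sketch uses the standard Wald approximations, which we assume are the form of (9) (that equation is not reproduced in this section). The error levels $10^{-3}$ and the parameters $c_k = \theta_k^{(0)}$, $\theta_k^{(1)} = 1.5\,\theta_k^{(0)}$ mirror the setup described below only loosely; all function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def kl_poisson(a: float, b: float) -> float:
    """(23): KL divergence between Poi(a) and Poi(b)."""
    return b - a + a * math.log(a / b)


def sprt_poisson(counts: Iterable[int], theta0: float, theta1: float,
                 alpha: float, beta: float) -> Tuple[str, int]:
    """Wald's SPRT for Poi(theta0) vs Poi(theta1), driven by the LLR (22)."""
    upper = math.log((1 - beta) / alpha)   # crossing it declares "abnormal"
    lower = math.log(beta / (1 - alpha))   # crossing it declares "normal"
    step = math.log(theta1 / theta0)
    llr, n = 0.0, 0
    for y in counts:
        n += 1
        llr += y * step - (theta1 - theta0)   # one term of (22)
        if llr >= upper:
            return "abnormal", n
        if llr <= lower:
            return "normal", n
    return "undecided", n


def wald_expected_samples(theta0: float, theta1: float,
                          alpha: float, beta: float) -> Tuple[float, float]:
    """Wald's approximations of E(N|H0) and E(N|H1) (assumed form of (9))."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    e1 = ((1 - beta) * upper + beta * lower) / kl_poisson(theta1, theta0)
    e0 = -(alpha * upper + (1 - alpha) * lower) / kl_poisson(theta0, theta1)
    return e0, e1


def index_order(pi: Sequence[float], c: Sequence[float], e0: Sequence[float],
                e1: Sequence[float], model: str) -> List[int]:
    """Components sorted by pi_k c_k / E(N_k) (independent model) or by
    pi_k c_k / E(N_k | H0) (exclusive model), largest index first."""
    def index(k: int) -> float:
        if model == "independent":
            denom = pi[k] * e1[k] + (1 - pi[k]) * e0[k]   # E(N_k), mixture over H0/H1
        else:
            denom = e0[k]
        return pi[k] * c[k] / denom
    return sorted(range(len(pi)), key=index, reverse=True)


theta0 = [10.0, 40.0, 70.0, 100.0]          # c_k = theta_k^(0), K = 4
sizes = [wald_expected_samples(t, 1.5 * t, 1e-3, 1e-3) for t in theta0]
e0 = [s[0] for s in sizes]
e1 = [s[1] for s in sizes]
print(index_order([0.8] * 4, theta0, e0, e1, "independent"))  # -> [3, 2, 1, 0]
```

Only the order of the indices matters for the selection rule, so an approximation that preserves that order, as reported above, leaves the policy unchanged.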

Next, we provide numerical examples to illustrate the performance of the algorithms. We compared three schemes: a random selection SPRT (R-SPRT), where a series of SPRTs is performed until all the components are tested in a random order (which is optimal for the problem of minimizing the detection delay over independent processes [12]), and the proposed $\pi cN$-SPRT and $\pi cN_0$-SPRT algorithms, which are optimal under the independent and exclusive models, respectively.

Let $\Delta_K = (100 - 10)/(K - 1)$. We set $c_k = \theta_k^{(0)} = 10 + (k-1)\Delta_K$ (i.e., the costs are equally spaced in the interval $[10, 100]$) and $\theta_k^{(1)} = 1.5\,\theta_k^{(0)}$. The error constraints were set to

P^^  =  10~^ ,  P^ ^  =  10“®  for  all  k.  For  the  independent  and 
exclusive models, we set $\pi_k = 0.8$ and $\pi_k = 1/K$ for all $k$, respectively. The performance of the $\pi cN$-SPRT and $\pi cN_0$-SPRT algorithms is presented in Fig. 1(a) and 1(b) under the independent and exclusive models, respectively, and compared to the R-SPRT. The proposed algorithms save roughly 50% of the objective value compared to the R-SPRT under both the independent and exclusive model scenarios.
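A rough way to explore this comparison is to evaluate the objective in closed form for a fixed testing order. Under the independent model, an abnormal component $k$ waits for the full tests of its predecessors, $\sum_{j \text{ before } k}\mathbb{E}(N_j)$, plus its own $\mathbb{E}(N_k \mid H_1)$; under the exclusive model its predecessors are all normal, so $\mathbb{E}(N_j \mid H_0)$ replaces $\mathbb{E}(N_j)$. The sketch models the R-SPRT as a uniformly random order (its objective is then the average over all orders) and uses Wald-approximated sample sizes with illustrative error levels; it is not the simulation behind Fig. 1.

```python
import itertools
import math
from statistics import mean


def kl_poisson(a, b):
    """KL divergence between Poi(a) and Poi(b), as in (23)."""
    return b - a + a * math.log(a / b)


def wald_expected_samples(theta0, theta1, alpha, beta):
    """Wald's approximations of (E(N|H0), E(N|H1)) for a Poisson SPRT."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    e1 = ((1 - beta) * upper + beta * lower) / kl_poisson(theta1, theta0)
    e0 = -(alpha * upper + (1 - alpha) * lower) / kl_poisson(theta0, theta1)
    return e0, e1


def expected_cost(order, pi, c, e0, e1, model):
    """Objective value when components are tested one at a time in `order`."""
    total, elapsed = 0.0, 0.0
    for k in order:
        total += pi[k] * c[k] * (elapsed + e1[k])   # k abnormal: wait + own test
        if model == "independent":
            elapsed += pi[k] * e1[k] + (1 - pi[k]) * e0[k]   # E(N_k)
        else:
            elapsed += e0[k]   # exclusive: components tested before the abnormal one are normal
    return total


def index_order(pi, c, e0, e1, model):
    def index(k):
        w = pi[k] * e1[k] + (1 - pi[k]) * e0[k] if model == "independent" else e0[k]
        return pi[k] * c[k] / w
    return sorted(range(len(pi)), key=index, reverse=True)


K = 5
delta = (100 - 10) / (K - 1)
c = [10 + k * delta for k in range(K)]      # c_k = theta_k^(0), equally spaced in [10, 100]
sizes = [wald_expected_samples(t, 1.5 * t, 1e-3, 1e-3) for t in c]
e0 = [s[0] for s in sizes]
e1 = [s[1] for s in sizes]
for model, p in (("independent", 0.8), ("exclusive", 1 / K)):
    pi = [p] * K
    best = expected_cost(index_order(pi, c, e0, e1, model), pi, c, e0, e1, model)
    rand = mean(expected_cost(o, pi, c, e0, e1, model)
                for o in itertools.permutations(range(K)))
    print(f"{model}: index order {best:.2f}, random order {rand:.2f}")
```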

Next, we simulate the independent model when 2 components are observed at a time and the total number of components is $K = 9$. Note that in this case the $\pi cN$-SPRT algorithm may not be optimal. We use an exhaustive search as a benchmark to demonstrate the performance of the $\pi cN$-SPRT algorithm in this scenario. The exhaustive search is done by performing a sequence of $K$ SPRTs for all possible testing orders; the minimal objective value is then chosen as the benchmark. We set the maximal cost to $c_{\max} = 100$, and the costs are equally spaced in the interval $[c_{\min}, 100]$. The error constraints were set
to  P^^  =  P^^  =  10^^  for  all  k.  The  performance  gain  of  the 
exhaustive search scheme over the $\pi cN$-SPRT algorithm as a function of $c_{\min}$ is presented in Fig. 2. The $\pi cN$-SPRT algorithm almost achieves the performance of the exhaustive search scheme in this scenario for all $c_{\min}$. For small $c_{\min}$ both algorithms perform the same, since the differences between the indices increase. The exhaustive search outperforms the $\pi cN$-SPRT algorithm for $c_{\min} > 97$, but the gain remains very small.

COHEN et al: OPTIMAL INDEX POLICIES FOR ANOMALY LOCALIZATION IN RESOURCE-CONSTRAINED CYBER SYSTEMS

Fig. 1. Objective value as a function of the number of components under the independent and exclusive models. (a) An independent model scenario. (b) An exclusive model scenario.

B. Composite Hypothesis Case

We consider the case of composite hypotheses, where there is uncertainty in the distribution parameters, as discussed in Section V. To implement the asymptotically optimal $\pi cN$-SGLRT/SALRT and $\pi cN_0$-SGLRT/SALRT algorithms, we need to compute the GLR or ALR statistics, defined in (12) and (14), and the expected sample sizes under the hypotheses, which can be well approximated by (20). The MLEs of the parameters over the parameter spaces $\Theta_k$ and $\Theta_k^{(i)}$ are given by the sample mean and the boundary of the alternative parameter space, respectively. As a result, substituting $\hat{\theta}_k(n) = \frac{1}{n}\sum_{r=1}^{n} y_k(r)$ and $\hat{\theta}_k^{(i)}(n) = \theta_k^{(i)}$ in (12) and (14) yields the GLR and ALR statistics, respectively. The KL divergence between the real value of $\theta_k$ and the parameter space $\Theta_k^{(i)}$ is given by:

$$D_k^{(i)}(\theta_k) = \theta_k^{(i)} - \theta_k + \theta_k\log\left(\frac{\theta_k}{\theta_k^{(i)}}\right) \quad (24)$$

for $\theta_k \notin \Theta_k^{(i)}$ (and $D_k^{(i)}(\theta_k) = 0$ otherwise). Substituting (24) in (20) yields the approximate expected sample size.
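A sketch of (20), (21) and (24) for the Poisson model, assuming a discrete prior (a list of rate values with weights) in place of $P^{(i)}$, so that the integrals in (20) become weighted sums. Each rate contributes the smaller of the two ratios $B_k^{(i)}/D_k^{(i)}(\theta_k)$, i.e., the one-sided test expected to cross its boundary first. The boundary values (e.g., $\log(1/c)$ for the SGLRT, or $\log(1/\alpha_k)$ and $\log(1/\beta_k)$ for the SALRT), the priors and all names are assumptions of the sketch.

```python
import math
from typing import List, Tuple

Prior = List[Tuple[float, float]]   # (rate theta_k, probability weight)


def kl_to_space(theta: float, boundary: float) -> float:
    """(24): KL divergence from Poi(theta) to the boundary point of
    Theta^(i); meaningful only when theta lies outside Theta^(i)."""
    return boundary - theta + theta * math.log(theta / boundary)


def expected_samples_at(theta: float, theta0: float, theta1: float,
                        b0: float, b1: float) -> float:
    """First-order approximation: the one-sided test that is expected to
    cross its boundary first determines the sample size."""
    candidates = []
    if theta > theta0:        # statistic rejecting H0 has positive drift
        candidates.append(b0 / kl_to_space(theta, theta0))
    if theta < theta1:        # statistic rejecting H1 has positive drift
        candidates.append(b1 / kl_to_space(theta, theta1))
    return min(candidates)


def expected_samples(prior: Prior, theta0: float, theta1: float,
                     b0: float, b1: float) -> float:
    """(20) with a discrete prior: a weighted sum instead of an integral."""
    return sum(w * expected_samples_at(t, theta0, theta1, b0, b1) for t, w in prior)


theta0, theta1 = 19.0, 21.0
b = math.log(100.0)                         # e.g. log(1/c) with c = 0.01 (assumed)
prior_h0 = [(15.0, 0.5), (18.0, 0.5)]       # hypothetical P^(0)
prior_h1 = [(25.0, 0.5), (40.0, 0.5)]       # hypothetical P^(1)
e0 = expected_samples(prior_h0, theta0, theta1, b, b)
e1 = expected_samples(prior_h1, theta0, theta1, b, b)
pi_k = 0.8
print(f"E(N|H0)~{e0:.2f}  E(N|H1)~{e1:.2f}  E(N)~{pi_k * e1 + (1 - pi_k) * e0:.2f}")
```

The last line applies (21); the resulting $\mathbb{E}(N_k)$ or $\mathbb{E}(N_k \mid H_0)$ then feeds the index $\pi_k c_k/\mathbb{E}(N_k)$ or $\pi_k c_k/\mathbb{E}(N_k \mid H_0)$.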

Fig. 2. Performance gain of an exhaustive search over the $\pi cN$-SPRT algorithm as a function of $c_{\min}$ under the independent model.

Fig. 3. Average number of observations as a function of the arrival rate of packets (denoted by $\theta$).

Next, we provide numerical examples to illustrate the performance of the algorithms under uncertainty. We simulated a network with homogeneous components (i.e., any selection rule is optimal). We compared three schemes: the R-SPRT, and the $\pi cN$-SGLRT/SALRT or $\pi cN_0$-SGLRT/SALRT algorithms (which achieve the same performance in this case) using the SALRT and the SGLRT, discussed in Section V-A. We set $\theta_k^{(0)} = 19$, $\theta_k^{(1)} = 21$. Under uncertainty, the IDS considers component $k$ as normal if $\theta_k \le \theta_k^{(0)}$, and tests $\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)}$ (i.e., $I_k = \{\theta_k \mid 19 < \theta_k < 21\}$ is the indifference
region).  To  implement  the  SGLRT,  we  set  the  cost  per  observa¬ 
tion  c  =  10“^.  According  to  the  assigned  cost,  we  obtained  the 
following error probability constraints for all $k$: the false-alarm probability is at most 0.026 for all $\theta_k \le 19$, and the miss-detection probability is at most 0.03 for all $\theta_k \ge 21$. We do not restrict the detector's performance for $19 < \theta_k < 21$ (note that narrowing the indifference region comes at the price of increasing the required sample size). In Fig. 3 we show the average number of observations (on a log scale) required for anomaly detection as a function of $\theta_k$. As expected, for $\theta_k = 19$ and $\theta_k = 21$ the R-SPRT requires a lower sample size than the proposed schemes. On the other hand, for most values of $\theta_k$ the SGLRT and the SALRT require a lower sample size than the R-SPRT. The SALRT performs the worst for $18 < \theta_k < 22$ and the best for $\theta_k \notin (18, 22)$, roughly. The SGLRT obtains the best average performance. For large values of $\theta_k$ the anomaly is detected very quickly, since the distance between the hypotheses increases. This result confirms that DoS attacks are much easier to detect than RoQ attacks.

VII. Conclusion

The problem of anomaly localization in a resource-constrained cyber system was studied. Due to resource constraints, only one component can be probed at a time. The observations are realizations drawn from two different distributions depending on whether the component is normal or anomalous. An abnormal component incurs a cost per unit time until it is tested and identified. The problem was formulated as a constrained optimization problem, where the objective is to minimize the total expected cost subject to error probability constraints. We considered two different anomaly models: the independent model, in which each component can be abnormal independently of the other components, and the exclusive model, in which there is one and only one abnormal component. For the simple hypothesis case, we derived optimal algorithms for both the independent and exclusive models. For the composite hypothesis case, we derived asymptotically (as the error probability approaches zero) optimal algorithms for both models. These optimal algorithms have low complexity.

The algorithms developed in this paper can be applied to other models of anomaly detection as well. The proposed algorithms can be adapted to any detection scheme that performs a series of tests according to the $\pi cN$-rule or the $\pi cN_0$-rule. The required modification is to replace the SPRT/SALRT/SGLRT by the given test. Such modified algorithms minimize the objective function among all the algorithms that perform the given test.

Deriving optimal policies for the anomaly localization problem considered in this paper requires the assumption that switching to a different component is allowed only when the state of the currently probed component is declared. A future research direction is to examine the anomaly localization problem under the case where switching to a different component and declarations of the states of individual components are allowed at all times.


Appendix 

In this appendix we provide the proofs of Theorems 1-3. For convenience, we use the superscripts A1, A2 when referring to the $\pi cN$-SPRT and $\pi cN_0$-SPRT algorithms, respectively. We use the superscripts A3, A4 when referring to the $\pi cN$-SGLRT/SALRT and $\pi cN_0$-SGLRT/SALRT algorithms, respectively.

Throughout the proofs, we use the specific formula for the updated posterior probability of component $k$ being abnormal. Let $\mathbf{1}_k(n)$ be the probing indicator function, where $\mathbf{1}_k(n) = 1$ if component $k$ is probed at time $n$ and $\mathbf{1}_k(n) = 0$ otherwise. Let $t_m$ be the time when the decision maker starts the $m$-th test. For example, assume that $K = 3$ and the decision maker tests the components in the following order: 3, 1, 2. Then, $t_1 = 1$ (when the test starts), $t_2 = \tau_3 + 1$, and $t_3 = \tau_1 + 1$.

Under the independent model, the posterior probability of component $k$ being abnormal can be updated at time $t_{m+1}$ as follows [22]:

$$\pi_k(t_{m+1}) = \big(1 - \mathbf{1}_k(t_m)\big)\,\pi_k(t_m) + \frac{\mathbf{1}_k(t_m)\,\pi_k(t_m)\, f_k^{(1)}(\mathbf{y}_k(N_k))}{\pi_k(t_m)\, f_k^{(1)}(\mathbf{y}_k(N_k)) + \big(1 - \pi_k(t_m)\big)\, f_k^{(0)}(\mathbf{y}_k(N_k))}, \quad (25)$$

where $\pi_k(t_1) = \pi_k$ denotes the a priori probability of component $k$ being abnormal. The term $\mathbf{y}_k(N_k) = (y_k(1), \ldots, y_k(N_k))$ denotes the $N_k$-size vector of observations taken from component $k$. Under the exclusive model, $\pi_k(t_{m+1})$ is given by (26). Note that, in contrast to the independent model, under the exclusive model the beliefs of all the components are updated after each test due to the dependency across components. The posterior probabilities depend on the selection rule and the collected measurements.
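Both update rules can be sketched in terms of the likelihood ratio $L = f_k^{(1)}(\mathbf{y}_k(N_k))/f_k^{(0)}(\mathbf{y}_k(N_k))$ of the tested component, obtained by dividing the numerator and denominator of (25) and (26) by $f^{(0)}$ of that component. The exclusive-model update makes the common factor $\gamma_j$ of (33) explicit. Function names are illustrative.

```python
from typing import List


def update_independent(pi: List[float], k: int, lr: float) -> List[float]:
    """(25): only the tested component k changes its belief."""
    new = list(pi)
    new[k] = pi[k] * lr / (pi[k] * lr + 1 - pi[k])
    return new


def update_exclusive(pi: List[float], k: int, lr: float) -> List[float]:
    """(26): every belief changes, since exactly one component is abnormal."""
    gamma = 1.0 / (pi[k] * lr + 1 - pi[k])   # gamma_j of (33), with j = k tested
    return [p * lr * gamma if i == k else p * gamma for i, p in enumerate(pi)]


print(update_independent([0.8, 0.8], 0, 3.0))
print(update_exclusive([0.5, 0.3, 0.2], 0, 3.0))
```

Under the exclusive model the untested beliefs are all scaled by the same $\gamma$, which is why the relative order of their indices is unchanged, as used in Step 2.1 below.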

A.  Proof  of  Theorem  1  Under  The  Exclusive  Model 

Let $\mathbb{E}'(N_k \mid H_i, t)$ be the expected sample size achieved by a stopping rule and a decision rule $(\tau'_k(t), \delta'_k(t))$ that depend on the time $t$ at which component $k$ is tested (i.e., $\tau'_k(t), \delta'_k(t)$ depend on the selection rule), such that the error constraints are satisfied. Let $\mathbb{E}^{A2}(N_k \mid H_i)$ be the expected sample size achieved by the SPRT's stopping rule and decision rule $(\tau_k^{A2}, \delta_k^{A2})$, which are independent of the time at which component $k$ is tested (i.e., independent of the selection rule), such that the error constraints are satisfied. By the optimality of the SPRT, $\mathbb{E}^{A2}(N_k \mid H_i) \le \mathbb{E}'(N_k \mid H_i, t)$ for all $k, t$ and $i = 0, 1$.

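$$\pi_k(t_{m+1}) = \frac{\mathbf{1}_k(t_m)\,\pi_k(t_m)\, f_k^{(1)}(\mathbf{y}_k(N_k))}{\pi_k(t_m)\, f_k^{(1)}(\mathbf{y}_k(N_k)) + \big(1-\pi_k(t_m)\big)\, f_k^{(0)}(\mathbf{y}_k(N_k))} + \frac{\big(1-\mathbf{1}_k(t_m)\big)\,\pi_k(t_m)\, f_{\phi(t_m)}^{(0)}\big(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)})\big)}{\pi_{\phi(t_m)}(t_m)\, f_{\phi(t_m)}^{(1)}\big(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)})\big) + \big(1-\pi_{\phi(t_m)}(t_m)\big)\, f_{\phi(t_m)}^{(0)}\big(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)})\big)}, \quad (26)$$

where $\phi(t_m)$ denotes the component selected at time $t_m$.

Step 1: Proving the theorem for $K = 2$: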
Assume that

$$\frac{\pi_1(t_1)c_1}{\mathbb{E}^{A2}(N_1 \mid H_0)} \ge \frac{\pi_2(t_1)c_2}{\mathbb{E}^{A2}(N_2 \mid H_0)}. \quad (27)$$


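Consider selection rules $\phi^{(1)}, \phi^{(2)}$ that select component 1 first followed by component 2, and component 2 first followed by component 1, respectively. The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(2)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau'(t),\delta'(t),\phi^{(2)})\Big\} = \mathbb{E}'(N_2\mid H_1,t_1)\,\pi_2(t_1)c_2 + \big(\mathbb{E}'(N_2\mid H_0,t_1) + \mathbb{E}'(N_1\mid H_1,t_2)\big)\,\pi_1(t_1)c_1. \quad (28)$$

The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(1)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau'(t),\delta'(t),\phi^{(1)})\Big\} = \mathbb{E}'(N_1\mid H_1,t_1)\,\pi_1(t_1)c_1 + \big(\mathbb{E}'(N_1\mid H_0,t_1) + \mathbb{E}'(N_2\mid H_1,t_2)\big)\,\pi_2(t_1)c_2. \quad (29)$$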


Note that the expected cost achieved by both selection rules can be further reduced by minimizing the expected sample sizes (such that the error constraints are satisfied) independently of the selection rules, which is achieved by $(\tau^{A2}, \delta^{A2})$. Therefore, an optimal solution must be $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ or $(\tau^{A2}, \delta^{A2}, \phi^{(2)})$.

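Next, we use the interchange argument to prove the theorem for $K = 2$. The expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(2)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^{A2},\delta^{A2},\phi^{(2)})\Big\} = \mathbb{E}^{A2}(N_2\mid H_1)\,\pi_2(t_1)c_2 + \big(\mathbb{E}^{A2}(N_2\mid H_0) + \mathbb{E}^{A2}(N_1\mid H_1)\big)\,\pi_1(t_1)c_1. \quad (30)$$

The expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^{A2},\delta^{A2},\phi^{(1)})\Big\} = \mathbb{E}^{A2}(N_1\mid H_1)\,\pi_1(t_1)c_1 + \big(\mathbb{E}^{A2}(N_1\mid H_0) + \mathbb{E}^{A2}(N_2\mid H_1)\big)\,\pi_2(t_1)c_2. \quad (31)$$

Subtracting (30) from (31) yields $\pi_2(t_1)c_2\,\mathbb{E}^{A2}(N_1\mid H_0) - \pi_1(t_1)c_1\,\mathbb{E}^{A2}(N_2\mid H_0) \le 0$ by (27). Hence, the expected cost achieved by $\phi^{(1)}$ does not exceed that achieved by $\phi^{(2)}$, which completes the proof for $K = 2$.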

Step 2: Proving the theorem by induction on the number of components $K$:

Assume that the theorem is true for $K - 1$ components (where one and only one component is abnormal). Assume that

$$\frac{\pi_1(t_1)c_1}{\mathbb{E}^{A2}(N_1\mid H_0)} \ge \frac{\pi_2(t_1)c_2}{\mathbb{E}^{A2}(N_2\mid H_0)} \ge \cdots \ge \frac{\pi_K(t_1)c_K}{\mathbb{E}^{A2}(N_K\mid H_0)}. \quad (32)$$

Consider the case of $K$ components and denote by $\phi^{(j)}$ an optimal selection rule that selects component $j$ first.

Step 2.1: Proving the theorem for the last $K - 1$ components:

Next, we show that the last $K - 1$ components must be selected in decreasing order of $\pi_k(t_1)c_k/\mathbb{E}^{A2}(N_k\mid H_0)$ and tested by the SPRT.

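Let

$$\gamma_j(t) \triangleq \frac{1}{\pi_j(t)\,\dfrac{f_j^{(1)}(\mathbf{y}_j(N_j))}{f_j^{(0)}(\mathbf{y}_j(N_j))} + 1 - \pi_j(t)}. \quad (33)$$

Note that when the decision maker completes testing component $j$, the other components update their beliefs according to:

$$\pi_k(t_2) = \gamma_j(t_1)\,\pi_k(t_1), \quad \forall k \neq j. \quad (34)$$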


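The expected cost achieved by $\phi^{(j)}$ given the outcome (at time $t_2$) of testing component $j$ (i.e., given the observation vector $\mathbf{y}_j(N_j)$) is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j)\Big\} = \pi_j(t_2)\,c_jN_j + \big(1-\pi_j(t_2)\big)\,\mathbb{E}\Big\{\sum_{k=1,k\neq j}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j), j\in\mathcal{H}_0\Big\}. \quad (35)$$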


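Let

$$\tilde{\tau}_k = \tau_k - N_j, \quad \forall k \neq j \quad (36)$$

be the modified stopping time, defined as the stopping time from $t = t_2$ until testing of component $k$ is completed. Thus, we can rewrite (35) as:

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j)\Big\} = \sum_{k=1}^{K}\pi_k(t_2)\,c_kN_j + \big(1-\pi_j(t_2)\big)\,\mathbb{E}\Big\{\sum_{k=1,k\neq j}^{K} c_k\tilde{\tau}_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j), j\in\mathcal{H}_0\Big\}. \quad (37)$$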


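The term $\sum_{k=1}^{K}\pi_k(t_2)c_kN_j$ in (37) follows since

$$\Pr\big(k\in\mathcal{H}_1 \mid \phi^{(j)}, \mathbf{y}_j(N_j), j\in\mathcal{H}_0\big) = \frac{\Pr\big(k\in\mathcal{H}_1, j\in\mathcal{H}_0 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\big)}{\Pr\big(j\in\mathcal{H}_0 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\big)} = \frac{\Pr\big(k\in\mathcal{H}_1 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\big)}{\Pr\big(j\in\mathcal{H}_0 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\big)} = \frac{\pi_k(t_2)}{1-\pi_j(t_2)}, \quad (38)$$

where the second equality holds because, under the exclusive model, $k \in \mathcal{H}_1$ implies $j \in \mathcal{H}_0$ for $k \neq j$.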


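Minimizing

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j)\Big\} \quad (39)$$

at time $t_2$ requires one to minimize

$$\mathbb{E}\Big\{\sum_{k=1,k\neq j}^{K} c_k\tilde{\tau}_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, \phi^{(j)}, \mathbf{y}_j(N_j), j\in\mathcal{H}_0\Big\} \quad (40)$$

in (37), since the first term in (37) is fixed given $\mathbf{y}_j(N_j)$.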

Note that (40) is the cost for $K - 1$ components (where one and only one component is abnormal) starting at time $t = t_2 = N_j + 1$, with prior probability $\pi_k(t_2)/(1-\pi_j(t_2))$ of component $k \neq j$ being abnormal. By the induction hypothesis, for any optimal selection rule $\phi^{(j)}$ that selects component $j$ first, arranging the last $K - 1$ components in decreasing order of $\pi_k(t_2)c_k/\mathbb{E}^{A2}(N_k\mid H_0)$ (and testing them by the SPRT) minimizes (40); the common factor $1/(1-\pi_j(t_2))$ does not affect this order.

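Since

$$\frac{\pi_k(t_2)}{1-\pi_j(t_2)} = \frac{\gamma_j(t_1)}{1-\pi_j(t_2)}\,\pi_k(t_1), \quad \forall k \neq j, \quad (41)$$

where the factor multiplying $\pi_k(t_1)$ is the same for all $k \neq j$, (32) implies

$$\frac{\pi_1(t_2)c_1}{\mathbb{E}^{A2}(N_1\mid H_0)} \ge \cdots \ge \frac{\pi_{j-1}(t_2)c_{j-1}}{\mathbb{E}^{A2}(N_{j-1}\mid H_0)} \ge \frac{\pi_{j+1}(t_2)c_{j+1}}{\mathbb{E}^{A2}(N_{j+1}\mid H_0)} \ge \cdots \ge \frac{\pi_K(t_2)c_K}{\mathbb{E}^{A2}(N_K\mid H_0)}. \quad (42)$$

Thus, the last $K - 1$ components must be selected in decreasing order of $\pi_k(t_1)c_k/\mathbb{E}^{A2}(N_k\mid H_0)$ and tested by the SPRT.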

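Step 2.2: Proving the theorem for all the $K$ components:

Finally, we show that component 1 (i.e., the component with the highest index) must be selected first. The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(j)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau'(t),\delta'(t),\phi^{(j)})\Big\} = \pi_j(t_1)c_j\,\mathbb{E}'(N_j\mid H_1,t_1) + \sum_{k=1,k\neq j}^{K}\pi_k(t_1)c_k\Big[\mathbb{E}'(N_j\mid H_0,t_1) + \sum_{i=1,i\neq j}^{k-1}\mathbb{E}^{A2}(N_i\mid H_0) + \mathbb{E}^{A2}(N_k\mid H_1)\Big]. \quad (43)$$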

First, note that the expected cost achieved by $\phi^{(j)}$ can be further reduced for all $j$ by minimizing the expected sample size $\mathbb{E}'(N_j\mid H_i, t_1)$ for $i = 0, 1$, which is achieved by $(\tau^{A2}, \delta^{A2})$. Therefore, an optimal solution must be $(\tau^{A2}, \delta^{A2}, \phi)$ for an optimal selection rule $\phi$. Thus, in the following we consider solutions of the form $(\tau^{A2}, \delta^{A2}, \phi)$. Next, to reach a contradiction, consider an optimal selection rule $\phi^{(j)}$ that selects component $j \neq 1$ first. By Step 2.1, $\phi^{(j)}$ selects the components in the following order: $j, 1, 2, \ldots, j-1, j+1, \ldots, K$.

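As a result, the expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(j)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^{A2},\delta^{A2},\phi^{(j)})\Big\} = \pi_j(t_1)c_j\,\mathbb{E}^{A2}(N_j\mid H_1) + \pi_1(t_1)c_1\Big[\mathbb{E}^{A2}(N_j\mid H_0) + \mathbb{E}^{A2}(N_1\mid H_1)\Big] + \sum_{k=2,k\neq j}^{K}\pi_k(t_1)c_k\Big[\mathbb{E}^{A2}(N_j\mid H_0) + \sum_{i=1,i\neq j}^{k-1}\mathbb{E}^{A2}(N_i\mid H_0) + \mathbb{E}^{A2}(N_k\mid H_1)\Big]. \quad (44)$$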


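We use the interchange argument to prove the theorem. Consider a selection rule that selects component 1 first followed by components $j, 2, 3, \ldots, j-1, j+1, \ldots, K$. Similar to (44), the expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^{A2},\delta^{A2},\phi^{(1)})\Big\} = \pi_1(t_1)c_1\,\mathbb{E}^{A2}(N_1\mid H_1) + \pi_j(t_1)c_j\Big[\mathbb{E}^{A2}(N_1\mid H_0) + \mathbb{E}^{A2}(N_j\mid H_1)\Big] + \sum_{k=2,k\neq j}^{K}\pi_k(t_1)c_k\Big[\mathbb{E}^{A2}(N_j\mid H_0) + \sum_{i=1,i\neq j}^{k-1}\mathbb{E}^{A2}(N_i\mid H_0) + \mathbb{E}^{A2}(N_k\mid H_1)\Big]. \quad (45)$$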


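By comparing (44) and (45), it can be verified that the sums over $k \ge 2$, $k \neq j$ coincide, and therefore

$$(44) - (45) = \pi_1(t_1)c_1\,\mathbb{E}^{A2}(N_j\mid H_0) - \pi_j(t_1)c_j\,\mathbb{E}^{A2}(N_1\mid H_0) > 0,$$

since $\pi_1(t_1)c_1/\mathbb{E}^{A2}(N_1\mid H_0) > \pi_j(t_1)c_j/\mathbb{E}^{A2}(N_j\mid H_0)$. The expected cost can thus be reduced by selecting component 1 first followed by component $j$, which contradicts the optimality of $\phi^{(j)}$. Hence, at time $t_1$ selecting component 1 minimizes the expected cost. We have already proved that selecting the last $K - 1$ components in decreasing order of $\pi_k(t_1)c_k/\mathbb{E}^{A2}(N_k\mid H_0)$ minimizes the objective function, which completes the proof. ■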


B.  Proof  of  Theorem  1  Under  The  Independent  Model 

Let $\mathbb{E}'(N_k\mid H_i, t)$ be the expected sample size achieved by a stopping rule and a decision rule $(\tau'_k(t), \delta'_k(t))$ that depend on the time $t$ at which component $k$ is tested (i.e., $\tau'_k(t), \delta'_k(t)$ depend on the selection rule), such that the error constraints are satisfied. Let $\mathbb{E}^{A1}(N_k\mid H_i)$ be the expected sample size achieved by the SPRT's stopping rule and decision rule $(\tau_k^{A1}, \delta_k^{A1})$, which are independent of the time at which component $k$ is tested (i.e., independent of the selection rule), such that the error constraints are satisfied. By the optimality of the SPRT, $\mathbb{E}^{A1}(N_k\mid H_i) \le \mathbb{E}'(N_k\mid H_i, t)$ for all $k, t$ and $i = 0, 1$, and these expected sample sizes are achieved by the $\pi cN$-SPRT algorithm.

First, consider the case where $K = 2$. Assume that

$$\frac{\pi_1(t_1)c_1}{\mathbb{E}^{A1}(N_1)} \ge \frac{\pi_2(t_1)c_2}{\mathbb{E}^{A1}(N_2)}.$$

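Consider selection rules $\phi^{(1)}, \phi^{(2)}$ that select component 1 first followed by component 2, and component 2 first followed by component 1, respectively. The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(2)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau'(t),\delta'(t),\phi^{(2)})\Big\} = \mathbb{E}'(N_2\mid H_1,t_1)\,\pi_2(t_1)c_2 + \big(\mathbb{E}'(N_2\mid t_1) + \mathbb{E}'(N_1\mid H_1,t_2)\big)\,\pi_1(t_1)c_1. \quad (46)$$

The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(1)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau'(t),\delta'(t),\phi^{(1)})\Big\} = \mathbb{E}'(N_1\mid H_1,t_1)\,\pi_1(t_1)c_1 + \big(\mathbb{E}'(N_1\mid t_1) + \mathbb{E}'(N_2\mid H_1,t_2)\big)\,\pi_2(t_1)c_2. \quad (47)$$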


Note that the expected cost achieved by both selection rules can be further reduced by minimizing the expected sample sizes (such that the error constraints are satisfied) independently of the selection rules, which is achieved by $(\tau^{A1}, \delta^{A1})$. Therefore, an optimal solution must be $(\tau^{A1}, \delta^{A1}, \phi^{(1)})$ or $(\tau^{A1}, \delta^{A1}, \phi^{(2)})$.

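Next, we use the interchange argument to prove the theorem for $K = 2$. The expected cost achieved by $(\tau^{A1}, \delta^{A1}, \phi^{(2)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^{A1},\delta^{A1},\phi^{(2)})\Big\} = \mathbb{E}^{A1}(N_2\mid H_1)\,\pi_2(t_1)c_2 + \big(\mathbb{E}^{A1}(N_2) + \mathbb{E}^{A1}(N_1\mid H_1)\big)\,\pi_1(t_1)c_1. \quad (48)$$

The expected cost achieved by $(\tau^{A1}, \delta^{A1}, \phi^{(1)})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{2} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^{A1},\delta^{A1},\phi^{(1)})\Big\} = \mathbb{E}^{A1}(N_1\mid H_1)\,\pi_1(t_1)c_1 + \big(\mathbb{E}^{A1}(N_1) + \mathbb{E}^{A1}(N_2\mid H_1)\big)\,\pi_2(t_1)c_2. \quad (49)$$

Subtracting (48) from (49) yields $\pi_2(t_1)c_2\,\mathbb{E}^{A1}(N_1) - \pi_1(t_1)c_1\,\mathbb{E}^{A1}(N_2) \le 0$ by the assumption above. Hence, the expected cost achieved by $\phi^{(1)}$ does not exceed that achieved by $\phi^{(2)}$, which completes the proof for $K = 2$.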

The  rest  of  the  proof  follows  by  induction  on  the  number  of 
components,  as  was  done  under  the  exclusive  model.  ■ 
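The ordering claim of Theorem 1 can be spot-checked numerically: for fixed positive expected sample sizes, the objective is a weighted sum of completion times, and the interchange argument says that sorting by the index is optimal. The sketch below draws hypothetical random instances and compares the index order with an exhaustive search over all $K!$ orders under both models. It takes the SPRT sample sizes as given, so it checks only the ordering step, not the optimality of the SPRT itself.

```python
import itertools
import random


def expected_cost(order, pi, c, e0, e1, model):
    """Objective for testing components one at a time in `order`."""
    total, elapsed = 0.0, 0.0
    for k in order:
        total += pi[k] * c[k] * (elapsed + e1[k])
        # predecessors cost E(N_j) (independent) or E(N_j | H0) (exclusive)
        elapsed += pi[k] * e1[k] + (1 - pi[k]) * e0[k] if model == "independent" else e0[k]
    return total


def index_order(pi, c, e0, e1, model):
    def index(k):
        w = pi[k] * e1[k] + (1 - pi[k]) * e0[k] if model == "independent" else e0[k]
        return pi[k] * c[k] / w
    return sorted(range(len(pi)), key=index, reverse=True)


def check(model, K, rng):
    """Return (cost of index order, minimal cost over all orders)."""
    c = [rng.uniform(1, 100) for _ in range(K)]
    e0 = [rng.uniform(1, 50) for _ in range(K)]
    e1 = [rng.uniform(1, 50) for _ in range(K)]
    if model == "independent":
        pi = [rng.uniform(0.05, 0.95) for _ in range(K)]
    else:
        raw = [rng.random() + 1e-3 for _ in range(K)]
        total = sum(raw)
        pi = [r / total for r in raw]    # one abnormal component: priors sum to 1
    brute = min(expected_cost(o, pi, c, e0, e1, model)
                for o in itertools.permutations(range(K)))
    greedy = expected_cost(index_order(pi, c, e0, e1, model), pi, c, e0, e1, model)
    return greedy, brute


rng = random.Random(7)
for model in ("independent", "exclusive"):
    gaps = [(g - b) / b for g, b in (check(model, 6, rng) for _ in range(20))]
    print(model, "max relative gap:", max(gaps))
```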


C.  Proof  of  Theorem  2 

For every $k$, let $\mathbb{E}^*(N_k\mid H_i)$ be the minimal expected sample size that can be achieved by any sequential test such that the error constraints are satisfied. Let $\mathbb{E}^{A3}(N_k\mid H_i)$ be the expected sample size achieved by the $\pi cN$-SGLRT/SALRT algorithm such that the error constraints are satisfied. Clearly, $\mathbb{E}^*(N_k\mid H_i) \le \mathbb{E}^{A3}(N_k\mid H_i)$ for all $k$ and $i = 0, 1$.

Assume that

$$\frac{\pi_1(t_1)c_1}{\mathbb{E}^*(N_1)} \ge \frac{\pi_2(t_1)c_2}{\mathbb{E}^*(N_2)} \ge \cdots \ge \frac{\pi_K(t_1)c_K}{\mathbb{E}^*(N_K)}. \quad (50)$$


Similar to the proof of Theorem 1, it can be verified that the optimal solution to (2) is to select the components in the following order: $1, 2, \ldots, K$, where the components are tested by a sequential test that achieves expected sample size $\mathbb{E}^*(N_k\mid H_i)$ for all $k$ and $i = 0, 1$. Therefore, the expected cost achieved by $(\tau^*, \delta^*, \phi^*)$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^*,\delta^*,\phi^*)\Big\} = \sum_{k=1}^{K}\pi_k(t_1)c_k\Big(\sum_{i=1}^{k-1}\mathbb{E}^*(N_i) + \mathbb{E}^*(N_k\mid H_1)\Big). \quad (51)$$


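By the asymptotic optimality property of the SALRT/SGLRT for a single process (used in the $\pi cN$-SGLRT/SALRT algorithm), it follows that $\mathbb{E}^{A3}(N_k\mid H_i) \sim \mathbb{E}^*(N_k\mid H_i)$ for all $k$ and $i = 0, 1$ as the error probabilities approach zero. As a result, for sufficiently small error probabilities, the solution $(\tau^{A3}, \delta^{A3}, \phi^{A3})$ is to select the components in the following order: $1, 2, \ldots, K$, where the components are tested by an asymptotically optimal sequential test that achieves expected sample size $\mathbb{E}^{A3}(N_k\mid H_i)$ for all $k$ and $i = 0, 1$. Therefore, the expected cost achieved by $(\tau^{A3}, \delta^{A3}, \phi^{A3})$ is given by:

$$\mathbb{E}\Big\{\sum_{k=1}^{K} c_k\tau_k\mathbf{1}\{k\in\mathcal{H}_1\} \,\Big|\, (\tau^{A3},\delta^{A3},\phi^{A3})\Big\} = \sum_{k=1}^{K}\pi_k(t_1)c_k\Big(\sum_{i=1}^{k-1}\mathbb{E}^{A3}(N_i) + \mathbb{E}^{A3}(N_k\mid H_1)\Big). \quad (52)$$

Since $\mathbb{E}^{A3}(N_k\mid H_i) \sim \mathbb{E}^*(N_k\mid H_i)$ for $i = 0, 1$ as the error probabilities approach zero for all $k$, comparing (51) and (52) proves the theorem. ■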


D.  Proof  of  Theorem  3 

The structure of the proof is similar to that of the proof of Theorem 2. Hence, we provide a sketch of the proof, using notation similar to that of the proof of Theorem 2. Similar to the proof of Theorem 1, it can be verified that the optimal solution to (2) is to select the components in decreasing order of $\pi_k(t_1)c_k/\mathbb{E}^*(N_k\mid H_0)$, where the components are tested by a sequential test that achieves expected sample size $\mathbb{E}^*(N_k\mid H_i)$ for all $k$ and $i = 0, 1$. By the asymptotic optimality property of the SALRT/SGLRT for a single process (used in the $\pi cN_0$-SGLRT/SALRT algorithm), it follows that $\mathbb{E}^{A4}(N_k\mid H_i) \sim \mathbb{E}^*(N_k\mid H_i)$ for all $k$ and $i = 0, 1$ as the error probabilities approach zero. As a result, for sufficiently small error probabilities, the solution $(\tau^{A4}, \delta^{A4}, \phi^{A4})$ is to select the components in decreasing order of $\pi_k(t_1)c_k/\mathbb{E}^*(N_k\mid H_0)$, where the components are tested by an asymptotically optimal sequential test that achieves expected sample size $\mathbb{E}^{A4}(N_k\mid H_i)$ for all $k$ and $i = 0, 1$. Similar to the proof of Theorem 2, comparing the objective functions achieved by $(\tau^*, \delta^*, \phi^*)$ and $(\tau^{A4}, \delta^{A4}, \phi^{A4})$ proves the theorem. ■